
Project Data Description

In this project, we analyze 3,202 time series of stock price and volume data for securities traded on the NASDAQ exchange between May 30 and October 30, 2019. This date range was selected for its distance from significant biological and political disruptions to the markets, which can introduce both artificial seasonality and increased random variation into forecasts. Data were sourced as comma-separated values via the AlphaVantage API.

Because of the time constraints involved in analyzing each stock directly, we developed a loop to process each file and fit a linear model to each stock. Where the fitted slope for price exceeds 0.04 and the latest price is within an affordable range (between 5 and 50 dollars per share), we select that stock and assess whether the spectral density indicates wandering behavior: a peak at zero and no additional peaks thereafter. Because of this wandering, the sample ACFs were also expected to damp slowly, indicative of non-stationary behavior. This method allowed us to identify stocks with a positive, profitable trend that are well suited to signal-plus-noise modeling. It yielded 7 stocks we deemed useful, from which we selected the one whose noise around the signal appeared most stationary.

This method uses linear regression to identify candidates for a profitable signal-plus-noise model fit. After the stocks are selected, the Cochrane-Orcutt procedure is needed to verify slope significance under serially correlated errors.

Data Selection

# plotts.sample.wge(df$low, trunc=25, arlimits=T) # exploratory plot; df is defined inside the loop below
files = list.files(path='../Time-Series-Stocks', pattern='*.csv')

for(file in files){
  actualFile <- paste0('../Time-Series-Stocks/',file)
  
  df <- read.csv(actualFile, header=T, strip.white=T)

  # run linear regression to get the signal (average).
  t = seq(1, nrow(df),1)
  fit = lm(df$low~t)
  
  # get the frequency values from the spectral density in the Parzen Window (we want them to wander without season; just trend)
  dbz <- plotts.sample.wge(df$low)[4] # plotting sample plots to see the stocks while they process

  # if the linear coefficient (deterministic signal) for the price is positive and the price is between 5 and 50 (affordable):
  if(fit$coefficients[2] > 0.04 && df$low[nrow(df)] > 5 && df$low[nrow(df)] < 50){
    # if the spectral density decreases monotonically from its peak at zero, the
    # realization is wandering; record the symbol once
    if(all(diff(dbz$dbz) < 0)){
      write.table(df$symbol[1], './models_aic_less_than_0.csv', append=T)
    }
  }
}

This is the stock we discovered. Below is a candlestick chart, just for fun.

Because the variation among the high, low, open, and close prices can indicate the stability of a stock's performance in terms of stationarity, we decided to include a ratio of box-to-tail dimensions (the open-to-close "box" relative to the full high-to-low range).
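As a minimal sketch of this ratio (the candle values below are made up for illustration; the same computation appears later as the `varianceRatio` column in `stock_data()`):

```r
# One hypothetical daily candle: the "box" is open-to-close, the "tail" spans high-to-low
candle <- data.frame(open = 36.10, high = 36.60, low = 35.90, close = 36.45)

# box-to-tail ratio as a percentage; negative values mean the close was above the open
boxToTail <- (candle$open - candle$close) / (candle$high - candle$low) * 100
boxToTail
```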

urlfile = "https://raw.githubusercontent.com/PaulAdams4361/Time-Series-Project/master/NASDAQ_Daily_ACGL.csv"
df <- read_csv(url(urlfile))
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
df <- data.frame(Date=df$times, coredata(df[,2:5]))

fig <- df %>% plot_ly(x = ~Date, type="candlestick",
          open = ~open, close = ~close,
          high = ~high, low = ~low) 
fig <- fig %>% layout(title = "Candlestick Chart for ACGL")

fig

Candlestick Chart for ACGL

For example, during an uptrend, if a trader is watching for price bars dropping below a moving average (MA), then using the low of each candle as the input for the MA may make more sense than using the close. https://www.thebalance.com/average-of-the-open-high-low-and-close-1031216
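A minimal base-R sketch of that idea, using made-up prices (not ACGL data): a trailing 5-day simple moving average computed on the lows instead of the closes.

```r
lows <- c(35.2, 35.5, 35.4, 35.9, 36.1, 36.0, 36.4)  # illustrative daily lows

# trailing 5-day simple moving average; the first 4 entries are NA by construction
ma5 <- stats::filter(lows, rep(1/5, 5), sides = 1)
round(as.numeric(ma5), 2)
```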

stock_data <- function(){
  urlfile="https://raw.githubusercontent.com/PaulAdams4361/Time-Series-Project/master/NASDAQ_Daily_ACGL.csv"
  df <- read_csv(url(urlfile))
  df$volume <- (df$volume/10000)
  HiLo <- df$high - df$low
  HiClo <- df$high - df$close
  OpHi <- df$open - df$high
  OpClo <- df$open - df$close
  OpLo <- df$open - df$low
  CloLo <- df$close - df$low
  varianceRatio <- (df$open - df$close) / (df$high - df$low) * 100
  df <- data.frame(cbind(df, varianceRatio, HiLo, HiClo, OpHi, OpClo, OpLo, CloLo))
  return(df)
}

df <- stock_data()
# head(df)

Signal-Plus-Noise Model

In this signal-plus-noise model, we first use an OLS hypothesis test on the linear trend to see whether the trend may be deterministic. After OLS flagged the slope as significant, we confirmed it with the Cochrane-Orcutt AR(1)-based hypothesis test, which accounts for serial correlation when judging slope significance. This test also supports treating the signal as a deterministic trend.
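The Cochrane-Orcutt idea can be sketched by hand in base R (on simulated data, not the ACGL series): estimate the AR(1) coefficient rho from the OLS residuals, quasi-difference both the response and the regressor, and refit.

```r
set.seed(1)
n <- 100; t <- 1:n
z <- arima.sim(model = list(ar = 0.7), n = n)  # AR(1) errors
x <- 35 + 0.07 * t + z                         # linear signal plus correlated noise

fit <- lm(x ~ t)                                  # naive OLS fit
rho <- cor(fit$residuals[-1], fit$residuals[-n])  # lag-1 autocorrelation of residuals

x.star <- x[-1] - rho * x[-n]                  # quasi-differenced response
t.star <- t[-1] - rho * t[-n]                  # quasi-differenced regressor
fitco <- lm(x.star ~ t.star)                   # slope inference now respects the AR(1) errors
summary(fitco)$coefficients["t.star", ]
```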

Next, we extracted the residuals from the trend line and built a model for them. We then tested the model's residuals for white noise using the Ljung-Box test, which indicated there is not enough evidence to conclude the residuals are anything more than white noise. Because the residuals are stationary, we were able to estimate a model combining the linear slope with the variation of the residuals.
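tswge's `ljung.wge` parallels base R's `Box.test`; as a sketch on simulated white noise (a stand-in, not our actual residuals):

```r
set.seed(42)
res <- rnorm(100, sd = sqrt(0.1178))  # stand-in residuals with roughly our estimated variance

# Ljung-Box test at K = 24 lags, as used below; a large p-value gives no
# evidence against white noise
bt <- Box.test(res, lag = 24, type = "Ljung-Box")
bt$p.value
```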

Forecasting error was measured as the Average Squared Error (ASE) over the last trading week's data points, of which there were five. The ASE was 0.1654078.
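The ASE here is simply the mean squared error over the five held-out points. A sketch with illustrative numbers (not our actual forecasts):

```r
actual   <- c(41.21, 41.84, 41.75, 41.90, 41.83)  # hypothetical last-week lows
forecast <- c(41.40, 41.50, 41.60, 41.70, 41.80)  # hypothetical 5-step-ahead forecasts

ASE <- mean((actual - forecast)^2)  # Average Squared Error over the holdout
ASE
```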

df <- stock_data()
# take a sample of the data, analyze
# plotts.sample.wge(df$low, arlimits=T)

####################
# Signal-Plus-Noise
####################
x = df$low
n = length(x)
t = 1:n
fit = lm(x ~ t)
summary(fit) # OLS suggests a deterministic trend; the slope's p-value is significant, so we reject the null hypothesis of zero slope
## 
## Call:
## lm(formula = x ~ t)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.10389 -0.52398 -0.02693  0.34676  1.35981 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.855438   0.123444  282.36   <2e-16 ***
## t            0.072922   0.002122   34.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6126 on 98 degrees of freedom
## Multiple R-squared:  0.9234, Adjusted R-squared:  0.9226 
## F-statistic:  1181 on 1 and 98 DF,  p-value: < 2.2e-16
# deterministic. The null hypothesis argues that any apparent trend is random and that the realization will eventually
# wander back to vary around its mean.

# Because OLS inference is not robust to serially correlated errors, we also apply the Cochrane-Orcutt test to the beta
# coefficient on time, using an approach that fits an AR(1) model to the residuals:
cfit = cochrane.orcutt(fit) # confirm with Cochrane-Orcutt
summary(cfit) # Cochrane-Orcutt also gives a significant p-value; based on this, we conclude the slope is not equal to zero and
## Call:
## lm(formula = x ~ t)
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 35.0451902  0.3933902  89.085 < 2.2e-16 ***
## t            0.0698107  0.0063528  10.989 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.366 on 97 degrees of freedom
## Multiple R-squared:  0.5546 ,  Adjusted R-squared:  0.55
## F-statistic: 120.8 on 1 and 97 DF,  p-value: < 9.97e-19
## 
## Durbin-Watson statistic 
## (original):    0.39519 , p-value: 1.096e-16
## (transformed): 1.49316 , p-value: 3.732e-03
# therefore, there is a deterministic trend that justifies fitting a signal-plus-noise model instead of an ARMA(p,q). However,
# an ARMA(p,q) model will be fitted later for comparison.

#x.z = x - fit$coefficients[1] - fit$coefficients[2]*t # derive residuals
x.z = fit$residuals # derive residuals (from the regression line)
ar.z = aic.wge(x.z, p=0:6, type="aic") # find a model to use for approximating the residuals. NOTE: (sigmaHAT_a)^2 = 0.1177843
# ar.z$p is the order p (aic selects p=2 where q=0, as does the bic)

# Transform the stock prices by the autoregressive coefficients fitted to the regression residuals.
# Removing this AR structure removes the serial correlation so that OLS inference on the slope is valid.
y.trans = artrans.wge(df$low, phi.tr=ar.z$phi)

# also transform the predictor variable (time) by the same autoregressive coefficients
t.trans  = artrans.wge(t, phi.tr=ar.z$phi)

# Finally, regress the newly transformed stock prices (Y-HAT_t) on the transformed time (T-HAT_t) using ordinary least squares
fitco = lm(y.trans ~ t.trans)
summary(fitco) # check to see if the transformed beta coefficient for the slope is still significant
## 
## Call:
## lm(formula = y.trans ~ t.trans)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97241 -0.18510  0.01497  0.20566  0.70938 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.320641   0.074125  125.74   <2e-16 ***
## t.trans     0.070037   0.004634   15.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3452 on 96 degrees of freedom
## Multiple R-squared:  0.7041, Adjusted R-squared:  0.701 
## F-statistic: 228.4 on 1 and 96 DF,  p-value: < 2.2e-16
# Evaluating the residuals after Cochrane-Orcutt:
plotts.wge(fitco$residuals)

acf(fitco$residuals) # residuals appear to be white noise

ljung.wge(fitco$residuals) # there is not enough evidence based on the Ljung-Box test to reject the null hypothesis; therefore,
## Obs -0.02329353 0.03696201 -0.0148023 -0.1264051 0.02979692 0.1170724 0.0121294 -0.04358782 -0.004767952 -0.09739137 0.04153028 -0.05785741 0.01704484 -0.08708388 -0.04190725 -0.00537386 0.008676478 -0.04774231 -0.123505 0.02173201 -0.01302386 -0.09454542 -0.1174163 -0.0818482
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 12.52529
## 
## $df
## [1] 24
## 
## $pval
## [1] 0.9733174
# we have no evidence that the residuals are anything other than white noise.

# Final Signal-Plus-Noise Model: X_t = 34.855438 + 0.072922*t + Z_t, (sigmaHAT_a)^2 = 0.1177843
# (the intercept and slope come from summary(lm(x ~ t)) above)
ar.z$phi
## [1]  1.0533533 -0.3193699
# (1 - 1.0533533*B + 0.3193699*B^2)*Z_t = a_t; ar.z$phi from the AR(2) fit supplies the coefficients, and (sigmaHAT_a)^2 = 0.1177843

# BUT, TO REITERATE: Final Signal-Plus-Noise Model is X_t = 34.855438 + 0.072922*t + Z_t, (sigmaHAT_a)^2 = 0.1177843
est_mod = gen.sigplusnoise.wge(100, b0=34.855438, b1 = 0.072922, phi=ar.z$phi, vara=0.1177843)

plotts.sample.wge(est_mod)

## $autplt
##  [1] 1.0000000 0.9649655 0.9334719 0.8986275 0.8595332 0.8221513 0.7876933
##  [8] 0.7459324 0.7166321 0.6894377 0.6560116 0.6220197 0.5859223 0.5509444
## [15] 0.5253477 0.4922292 0.4611314 0.4333708 0.4073432 0.3874729 0.3672227
## [22] 0.3466789 0.3296429 0.3128115 0.2965247 0.2860062
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  14.7525725   7.4233097   8.1155006   4.0329426  -1.6209091  -0.6176912
##  [7]  -4.7737109  -0.7161254  -3.6425942  -7.1455329  -5.1975118  -7.7020639
## [13] -10.0875143 -10.6617690 -12.7619963 -11.8577599 -14.4675184 -13.6239735
## [19] -13.5212229  -6.4339982 -14.6141046  -9.6812205 -19.3580318 -14.4146666
## [25] -14.3590638 -22.9415694 -25.0874711 -10.9702227 -22.2216967 -19.3419107
## [31] -15.2091087 -11.4984355 -13.4616939 -14.2089079  -9.3625433  -9.9840964
## [37] -17.3619606 -25.4304625 -15.3709873 -17.6853924 -18.1427148 -15.3960836
## [43] -13.1133405 -16.5083534 -15.9430166 -19.4005299 -15.1912996 -13.4934606
## [49] -11.5578654 -17.5556580
## 
## $dbz
##  [1]  10.754638  10.042620   8.849643   7.173591   5.033192   2.509234
##  [7]  -0.182278  -2.657956  -4.586863  -6.006233  -7.143541  -8.096754
## [13]  -8.887202  -9.613298 -10.400862 -11.251529 -11.999172 -12.463399
## [19] -12.669235 -12.815423 -13.069482 -13.484916 -14.040176 -14.688570
## [25] -15.365573 -15.964868 -16.348479 -16.431500 -16.248806 -15.903508
## [31] -15.487049 -15.071216 -14.734013 -14.560425 -14.619282 -14.941545
## [37] -15.507757 -16.239307 -16.995876 -17.602315 -17.927011 -17.961229
## [43] -17.801900 -17.558560 -17.291010 -17.012529 -16.727715 -16.464280
## [49] -16.273186 -16.202959
plotts.sample.wge(df$low) # the estimated model (above) matches the actual data (here) on both the sample ACF and the sample spectral

## $autplt
##  [1] 1.0000000 0.9542096 0.8987323 0.8524345 0.8100849 0.7813310 0.7600881
##  [8] 0.7371705 0.7123568 0.6828689 0.6476693 0.6153474 0.5864842 0.5600570
## [15] 0.5282542 0.4955941 0.4660246 0.4343222 0.4026243 0.3675647 0.3278754
## [22] 0.2900614 0.2568547 0.2354177 0.2239847 0.2178967
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  14.2537148   9.9541873   5.8173014  -0.5437278  -0.8235314  -1.3564258
##  [7]   1.4197169  -1.1176687  -7.3533813  -3.8782332  -0.5445071  -3.3199576
## [13]  -5.9250727  -5.4172047  -2.9311702 -11.1750831  -6.3519213 -18.5926329
## [19] -18.7079883  -8.4217804  -9.3402655  -9.6710352  -9.7031584  -8.2549030
## [25] -14.1830462 -11.3498316 -12.9743644 -12.1531178 -16.0829501  -9.7837888
## [31] -12.9010407  -9.5737077 -12.8850744 -15.0135304 -10.3417581 -16.6716427
## [37] -13.1487825 -11.0083985 -16.2745553 -18.6304870 -17.7423184 -13.7665082
## [43] -16.0810139 -16.9046379 -13.1224062 -20.5533807 -18.2518162 -14.7903000
## [49] -19.2395748 -19.0265205
## 
## $dbz
##  [1]  10.6247182   9.9050329   8.7000676   7.0111987   4.8698055   2.3951954
##  [7]  -0.1075597  -2.1509930  -3.4458827  -4.1937204  -4.7075065  -5.1324715
## [13]  -5.5423563  -6.0346253  -6.7027509  -7.5765092  -8.5979157  -9.6409675
## [19] -10.5694523 -11.2993670 -11.8169224 -12.1628463 -12.4196726 -12.6865263
## [25] -13.0293947 -13.4394270 -13.8345951 -14.1178754 -14.2557593 -14.2960188
## [31] -14.3127453 -14.3564021 -14.4497949 -14.6054534 -14.8345314 -15.1426627
## [37] -15.5243385 -15.9631650 -16.4335783 -16.8977220 -17.3041080 -17.6055003
## [43] -17.7946958 -17.9201902 -18.0549021 -18.2464892 -18.4904595 -18.7368379
## [49] -18.9191710 -18.9861139
# density as well as on partial ACF (below):
pacf(est_mod)
pacf(df$low)

#########################################################################################################
############### Confirmation for repeated model visualization of ACF and Spectral Density ###############
#########################################################################################################
for(i in 1:5) {
   SpecDen2 = parzen.wge(gen.sigplusnoise.wge(100, b0=34.855438, b1=0.072922, phi=ar.z$phi, vara=0.1177843), plot = T)
   lines(SpecDen2$freq, SpecDen2$pzgram, lwd = 2, col = "red")
}

for(i in 1:5) {
   ACF2 = acf(gen.sigplusnoise.wge(100, b0=34.855438, b1=0.072922, phi=ar.z$phi, vara=0.1177843), plot = T)
   lines(ACF2$lag, ACF2$acf, lwd = 2, col = "red")
}

#########################################################################################################
#########################################################################################################
#########################################################################################################

signoise.forecast <- fore.sigplusnoise.wge(df$low, max.p=2, n.ahead=5, limits=T, lastn=T, plot=T)
## 
## Coefficients of Original polynomial:  
## 1.0507 -0.3152 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0507B+0.3152B^2    1.6667+-0.6283i      0.5614       0.0574
##   
## 

SIGNOISE_ASE = mean((df$low[(nrow(df)-5+1):nrow(df)] - signoise.forecast$f)^2)
SIGNOISE_ASE # 0.133713
## [1] 0.133713

ARIMA(p,d,q) Model

In the ARIMA model, we selected a forecast horizon of five trading days because this time frame completes a full business week; models can be fully redeveloped on non-trading days. However, unless the model contains two unit roots, an ARIMA forecast will not project a trend to continue; instead, the forecasts converge back toward the mean.
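That mean-reverting behavior can be seen in a small base-R sketch (simulated AR(1) data, not our stock series): with no unit root, long-horizon forecasts collapse to the estimated mean.

```r
set.seed(7)
x <- arima.sim(model = list(ar = 0.8), n = 200) + 40  # stationary series around 40

fit <- arima(x, order = c(1, 0, 0))     # AR(1); "intercept" in arima() is the series mean
fc  <- predict(fit, n.ahead = 50)$pred  # 50-step-ahead forecasts

# the forecast path decays geometrically toward the estimated mean
c(first = fc[1], last = fc[50], mean = unname(coef(fit)["intercept"]))
```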

From visual inspection of the Parzen-window spectral density estimate and the sample autocorrelations, it is apparent the data are wandering. Therefore, we differenced the data to move toward stationarity.
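The first difference used below (via `artrans.wge` with `phi.tr = 1`) is the same operation as base R's `diff()`; a sketch with illustrative price levels:

```r
x  <- c(34.23, 33.98, 34.36, 34.54, 34.78)  # illustrative price levels
dx <- diff(x)                               # one-step differences, length n - 1
dx
```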

df <- stock_data()
# take a sample of the data, analyze
# plotts.sample.wge(df$low, arlimits=T)

###########################################################
########################################################### ARMA/ARIMA
###########################################################

factor.wge(df$low) # many unit roots in the data
## 
## Coefficients of Original polynomial:  
## 34.2300 33.9800 34.3624 34.5350 34.7800 35.4000 35.4984 35.3650 34.8500 34.9200 34.8900 34.9600 35.4300 35.4900 35.7200 36.0700 36.1800 36.0977 35.8550 35.2100 35.5450 36.3550 37.0200 37.5800 37.7600 37.6800 37.7700 37.9300 38.3300 38.0200 38.2100 38.1785 38.3000 38.1100 37.8100 37.5500 37.3100 37.1500 37.3000 37.6100 37.6500 38.0200 38.2400 38.4200 38.4100 38.3200 37.5700 37.7600 38.1900 39.1200 39.1800 38.8300 38.8350 38.2700 38.3400 38.9400 39.8000 39.7300 39.5600 39.3800 38.4300 38.6700 38.8300 38.8500 39.2800 39.4300 39.3000 40.1300 41.0000 41.1100 41.0900 40.3000 40.3500 40.4000 40.1100 40.2700 40.4400 40.5200 40.9600 40.7200 40.9200 41.0700 41.2300 41.8900 41.8300 41.9000 41.3300 40.6700 40.5200 41.2400 41.6000 40.7400 40.7500 41.0100 41.1900 41.2700 41.7500 41.2100 41.8400 41.7450 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-35.2232B              0.0284               35.2232       0.0000
## 1-1.5436B+1.0067B^2    0.7667+-0.6368i      1.0033       0.1103
## 1-1.1754B+1.0060B^2    0.5842+-0.8079i      1.0030       0.1504
## 1-0.9656B+1.0055B^2    0.4802+-0.8741i      1.0027       0.1700
## 1-0.1240B+1.0053B^2    0.0617+-0.9954i      1.0027       0.2401
## 1+0.8525B+1.0052B^2   -0.4240+-0.9028i      1.0026       0.3199
## 1+0.6193B+1.0052B^2   -0.3081+-0.9486i      1.0026       0.3000
## 1-1.8153B+1.0052B^2    0.9030+-0.4236i      1.0026       0.0698
## 1+1.4615B+1.0050B^2   -0.7272+-0.6829i      1.0025       0.3800
## 1+1.1803B+1.0049B^2   -0.5872+-0.8064i      1.0025       0.3502
## 1-1.4636B+1.0046B^2    0.7284+-0.6817i      1.0023       0.1197
## 1-0.6200B+1.0046B^2    0.3086+-0.9488i      1.0023       0.2000
## 1-1.3722B+1.0044B^2    0.6831+-0.7273i      1.0022       0.1300
## 1-1.9884B+1.0044B^2    0.9899+-0.1258i      1.0022       0.0201
## 1-1.6208B+1.0044B^2    0.8069+-0.5870i      1.0022       0.1001
## 1-0.2510B+1.0044B^2    0.1249+-0.9900i      1.0022       0.2300
## 1-0.4977B+1.0043B^2    0.2478+-0.9666i      1.0022       0.2101
## 1-1.9688B+1.0043B^2    0.9802+-0.1870i      1.0022       0.0300
## 1-1.7551B+1.0043B^2    0.8738+-0.4818i      1.0021       0.0802
## 1-1.2792B+1.0042B^2    0.6369+-0.7682i      1.0021       0.1398
## 1-0.3763B+1.0042B^2    0.1873+-0.9802i      1.0021       0.2199
## 1+0.1270B+1.0040B^2   -0.0633+-0.9960i      1.0020       0.2601
## 1+0.3752B+1.0037B^2   -0.1869+-0.9805i      1.0019       0.2800
## 1-1.8631B+1.0037B^2    0.9281+-0.3674i      1.0019       0.0600
## 1+1.3714B+1.0037B^2   -0.6832+-0.7277i      1.0018       0.3700
## 1-1.9997B+1.0037B^2    0.9962+-0.0630i      1.0018       0.0100
## 1+0.9658B+1.0036B^2   -0.4812+-0.8746i      1.0018       0.3301
## 1+0.7371B+1.0034B^2   -0.3673+-0.9283i      1.0017       0.3100
## 1+0.2521B+1.0033B^2   -0.1257+-0.9904i      1.0016       0.2701
## 1+1.9060B+1.0032B^2   -0.9499+-0.3072i      1.0016       0.4502
## 1+1.7548B+1.0032B^2   -0.8746+-0.4816i      1.0016       0.4199
## 1+1.9874B+1.0032B^2   -0.9906+-0.1250i      1.0016       0.4800
## 1-1.9042B+1.0028B^2    0.9494+-0.3095i      1.0014       0.0501
## 1-1.0736B+1.0028B^2    0.5353+-0.8430i      1.0014       0.1600
## 1+1.0729B+1.0026B^2   -0.5351+-0.8433i      1.0013       0.3400
## 1-1.9398B+1.0025B^2    0.9675+-0.2479i      1.0012       0.0399
## 1-1.6900B+1.0025B^2    0.8429+-0.5357i      1.0012       0.0901
## 1+0.0017B+1.0024B^2   -8e-04+-0.9988i      1.0012       0.2501
## 1+1.8123B+1.0022B^2   -0.9041+-0.4247i      1.0011       0.4301
## 1+1.5436B+1.0022B^2   -0.7701+-0.6362i      1.0011       0.3901
## 1+1.8617B+1.0021B^2   -0.9289+-0.3675i      1.0011       0.4400
## 1+1.2774B+1.0018B^2   -0.6375+-0.7692i      1.0009       0.3601
## 1+1.6906B+1.0018B^2   -0.8438+-0.5350i      1.0009       0.4101
## 1+0.4969B+1.0018B^2   -0.2480+-0.9678i      1.0009       0.2899
## 1+1.6195B+1.0016B^2   -0.8085+-0.5872i      1.0008       0.4000
## 1+1.0008B             -0.9992               1.0008       0.5000
## 1+1.9975B+1.0015B^2   -0.9973+-0.0628i      1.0007       0.4900
## 1+1.9662B+1.0014B^2   -0.9817+-0.1866i      1.0007       0.4701
## 1+1.9383B+1.0011B^2   -0.9681+-0.2483i      1.0005       0.4600
## 1-0.7369B+1.0009B^2    0.3681+-0.9293i      1.0004       0.1900
## 1-0.8524B+0.9998B^2    0.4263+-0.9047i      0.9999       0.1799
##   
## 
aic5.wge(df$low, type="aic")
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 16    5    0  -1.910377
## 5     1    1  -1.897324
## 7     2    0  -1.893150
## 17    5    1  -1.891381
## 13    4    0  -1.888234
est.arma.wge(df$low, p=5, q=0) # the factor table of the fitted AR(5) contains a near-(1-B) factor, (1-0.9956B), a close-enough
## 
## Coefficients of Original polynomial:  
## 1.1877 -0.2735 -0.0237 -0.1048 0.2089 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-0.9956B              1.0044               0.9956       0.0000
## 1-1.0272B+0.5751B^2    0.8930+-0.9702i      0.7584       0.1316
## 1+0.8350B+0.3648B^2   -1.1444+-1.1964i      0.6040       0.3715
##   
## 
## $phi
## [1]  1.1877481 -0.2735138 -0.0236647 -0.1047691  0.2088935
## 
## $theta
## [1] 0
## 
## $res
##   [1] -0.165360679 -0.248592128  0.363885549  0.015565862  0.159446632
##   [6]  0.554243110  0.079621732 -0.057074013 -0.382429141  0.278880923
##  [11] -0.097482597 -0.019422817  0.334796257 -0.030095363  0.241082142
##  [16]  0.359033945  0.152269066 -0.051404314 -0.146420209 -0.534437138
##  [21]  0.436742881  0.635086671  0.406138563  0.388882305  0.274632352
##  [26]  0.164626052  0.312598317  0.267836037  0.402397484 -0.382792087
##  [31]  0.314742389 -0.019790513  0.192239782 -0.262226805 -0.219405077
##  [36] -0.215162983 -0.213589314 -0.212029435  0.064473180  0.182297318
##  [41] -0.079496154  0.364704285  0.212652401  0.234638887  0.019206979
##  [46]  0.025931985 -0.669887571  0.358972500  0.317386679  0.763533802
##  [51] -0.218741112 -0.198886132  0.265605639 -0.392032019  0.154145806
##  [56]  0.467383695  0.694146794 -0.291790777 -0.003869088  0.067492657
##  [61] -0.752101252  0.376021762 -0.016323724 -0.126548116  0.267208878
##  [66]  0.139327188 -0.084122195  0.920160373  0.813195166  0.039685491
##  [71]  0.101677889 -0.499777879  0.403443472  0.007294153 -0.372186338
##  [76]  0.068529859  0.140618696  0.050394737  0.404830882 -0.254532034
##  [81]  0.337154539  0.167243706  0.227491623  0.626154227 -0.099359380
##  [86]  0.200147619 -0.468357850 -0.397890716 -0.062379121  0.661642566
##  [91]  0.035477245 -1.008810154  0.260311114  0.388498810  0.129381352
##  [96] -0.098366118  0.522695415 -0.536131917  0.832977988 -0.175861021
## 
## $avar
## [1] 0.131286
## 
## $aic
## [1] -1.910377
## 
## $aicc
## [1] -0.878203
## 
## $bic
## [1] -1.754067
## 
## $se.phi
## [1] 0.009628394 0.024824361 0.027846609 0.027616707 0.010504271
## 
## $se.theta
## [1] 0
# approximation. Therefore, we difference the data once. Preliminary evidence that differencing is useful comes from
# the wandering behavior noted above and the slowly damping sample autocorrelations. An overfit table showed a factor
# close to (1-B)^2, but it was only the third most significant (using estimated_arma <- est.arma.wge(dftrans, p=15, q=1)),
# so we decided to use only a first difference.
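The near-unit root called out above can be double-checked without tswge by factoring the AR characteristic polynomial with base R's `polyroot()` (using the p=5 estimates printed above):

```r
phi <- c(1.1877, -0.2735, -0.0237, -0.1048, 0.2089)  # est.arma.wge AR(5) coefficients above

# roots of 1 - phi1*z - phi2*z^2 - ... - phi5*z^5; a modulus near 1 flags a near-unit root
r <- polyroot(c(1, -phi))
sort(Mod(r))  # smallest modulus ~1.0044, matching the (1-0.9956B) factor above
```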

dftrans <- artrans.wge(df$low, phi.tr=1)

aic5.wge(dftrans, type="aic") # aic = -2.001944
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 17    5    1  -2.001944
## 13    4    0  -1.985397
## 14    4    1  -1.966194
## 16    5    0  -1.966140
## 10    3    0  -1.934893
estimated_arma <- est.arma.wge(dftrans, p=5, q=0)
## 
## Coefficients of Original polynomial:  
## 0.1227 -0.1183 -0.1454 -0.2587 -0.0292 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1-1.0644B+0.6446B^2    0.8257+-0.9325i      0.8028       0.1347
## 1+0.8220B+0.3777B^2   -1.0882+-1.2098i      0.6146       0.3666
## 1+0.1198B             -8.3479               0.1198       0.5000
##   
## 
estimated_arma$phi
## [1]  0.12269667 -0.11827203 -0.14543411 -0.25874556 -0.02916165
#estimated_arma <- est.arma.wge(dftrans, p=15, q=1)
ljung.wge(estimated_arma$res, p=5, q=0) # Indication there is no serial correlation in the residuals of the model; this is a good fit.
## Obs -0.001755173 -0.0006645789 -0.03189077 -0.0558602 -0.03745613 -0.006650248 -0.07239689 -0.1447362 -0.01711384 -0.08391131 0.05274902 -0.02483137 0.045087 -0.06707539 -0.004246384 0.07628534 0.08560867 -0.0009574519 -0.1290798 0.02213024 0.01599169 -0.07501175 -0.1093225 -0.02137378
## $test
## [1] "Ljung-Box test"
## 
## $K
## [1] 24
## 
## $chi.square
## [1] 11.57463
## 
## $df
## [1] 19
## 
## $pval
## [1] 0.9029941
estimated_arma$avar # 0.1240151
## [1] 0.1240151
mean(df$low) # 38.53802
## [1] 38.53802
# (1-B)(1 - 0.12269667B + 0.11827203B^2 + 0.14543411B^3 + 0.25874556B^4 + 0.02916165B^5)*(X_t - 38.53802) = a_t, (sigma-hat_a)^2 = 0.1240151
# expanded: (1 - 1.12269667B + 0.24096870B^2 + 0.02716208B^3 + 0.11331145B^4 - 0.22958391B^5 - 0.02916165B^6)*(X_t - 38.53802) = a_t, (sigma-hat_a)^2 = 0.1240151
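The expanded polynomial can be verified mechanically: multiplying (1 - B) by the AR(5) characteristic polynomial is a convolution of their coefficient vectors.

```r
phi <- c(0.12269667, -0.11827203, -0.14543411, -0.25874556, -0.02916165)  # AR(5) estimates above
arpoly <- c(1, -phi)  # coefficients of 1 - phi1*B - ... - phi5*B^5

# coefficients of (1 - B) * phi(B), powers B^0 through B^6
expanded <- convolve(c(1, -1), rev(arpoly), type = "open")
round(expanded, 4)
```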

######
###### Plotting residuals

# the residuals of the model do not appear correlated. The sample autocorrelations and partial autocorrelations are
# consistent with white noise, with all lags falling within the 5% limits
plotts.sample.wge(estimated_arma$res)

## $autplt
##  [1]  1.0000000000 -0.0017551730 -0.0006645789 -0.0318907670 -0.0558601955
##  [6] -0.0374561274 -0.0066502482 -0.0723968930 -0.1447362269 -0.0171138352
## [11] -0.0839113138  0.0527490212 -0.0248313665  0.0450869990 -0.0670753909
## [16] -0.0042463842  0.0762853413  0.0856086721 -0.0009574519 -0.1290798025
## [21]  0.0221302437  0.0159916891 -0.0750117483 -0.1093225110 -0.0213737804
## [26]  0.0330170662
## 
## $freq
##  [1] 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505 0.06060606
##  [7] 0.07070707 0.08080808 0.09090909 0.10101010 0.11111111 0.12121212
## [13] 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717 0.18181818
## [19] 0.19191919 0.20202020 0.21212121 0.22222222 0.23232323 0.24242424
## [25] 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929 0.30303030
## [31] 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535 0.36363636
## [37] 0.37373737 0.38383838 0.39393939 0.40404040 0.41414141 0.42424242
## [43] 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747 0.48484848
## [49] 0.49494949
## 
## $db
##  [1]  -8.833455090   0.263787556 -11.498115493   2.117874832   2.272952545
##  [6]  -9.309718287   5.018377394   0.289213348  -1.892315270 -11.308615271
## [11]   3.725194391  -0.611197010 -12.632295451  -0.590314051   3.046760336
## [16] -12.371079560   0.004883422   2.510605713   4.337557356  -3.178637232
## [21] -17.271334056  -1.077447786  -2.257551049   3.231804970  -1.965461358
## [26]  -3.043185676  -1.825761971  -7.420727112   2.312103352  -1.619957827
## [31]  -7.841066943   5.001502888  -2.103213653   0.521620232   2.860045258
## [36]  -2.911236603  -9.089689687   4.807535654  -1.618776046 -13.746143879
## [41]  -6.930503276   3.442666207  -8.907109791  -0.989371943   3.916436743
## [46] -14.309350664   0.883844166   0.342804251  -6.422777409
## 
## $dbz
##  [1] -2.1421139616 -1.5268677158 -0.7764179465 -0.0840477881  0.4450188370
##  [6]  0.7709948539  0.8934228747  0.8371033041  0.6469486934  0.3863575660
## [11]  0.1329634098 -0.0354873769 -0.0674530562  0.0372285907  0.2260959001
## [16]  0.4205848653  0.5490294530  0.5673324474  0.4653779820  0.2642966345
## [21]  0.0072262109 -0.2555822359 -0.4825978402 -0.6484742078 -0.7390119560
## [26] -0.7427274437 -0.6522279590 -0.4757583702 -0.2446363728 -0.0055391248
## [31]  0.1965782737  0.3316246748  0.3873844148  0.3649753175  0.2727409570
## [36]  0.1240460268 -0.0586583169 -0.2366934318 -0.3546826477 -0.3582603902
## [41] -0.2277399300 -0.0007442644  0.2413219942  0.4108729200  0.4464568416
## [46]  0.3298807324  0.0947313908 -0.1721868052 -0.3517455314
par(mfrow=c(3,1))
plot(estimated_arma$res, ylab="ARIMA Residuals", xlab = "Trading Days", main = "ARIMA(5,1,1) Model Residuals over Time", lwd=2)
abline(h=mean(estimated_arma$res), col="blue")
acf(estimated_arma$res) # this seems to be white noise; all lags are within the 5% limits
pacf(estimated_arma$res) # this seems to be white noise; all lags are within the 5% limits
aic5.wge(estimated_arma$res, type="aic") # intended to confirm white noise (ARMA(0,0)) for the residuals; note the aic calculations below errored, so the 999999 values are placeholders
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Error in aic calculation at 0 0 
## Error in aic calculation at 0 1 
## Error in aic calculation at 0 2 
## Error in aic calculation at 1 0 
## Error in aic calculation at 1 1 
## Error in aic calculation at 1 2 
## Error in aic calculation at 2 0 
## Error in aic calculation at 2 1 
## Error in aic calculation at 2 2 
## Error in aic calculation at 3 0 
## Error in aic calculation at 3 1 
## Error in aic calculation at 3 2 
## Error in aic calculation at 4 0 
## Error in aic calculation at 4 1 
## Error in aic calculation at 4 2 
## Error in aic calculation at 5 0 
## Error in aic calculation at 5 1 
## Error in aic calculation at 5 2 
## Five Smallest Values of  aic
##      p    q        aic
## 1    0    0     999999
## 2    0    1     999999
## 3    0    2     999999
## 4    1    0     999999
## 5    1    1     999999
######
######

######
###### Compare estimated model to differenced data

# We generated a realization of 99 points from the ARMA(5,1) identified by aic5, differenced once (d=1),
# using the white-noise variance estimated from the differenced data (0.1172604)
est_mod <- gen.aruma.wge(99, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1, vara=estimated_arma$avar, sn=40)
plotts.sample.wge(est_mod, arlimits=T) # estimated model
## $autplt
##  [1] 1.0000000 0.9319568 0.8616508 0.8115232 0.7837261 0.7798462 0.7686385
##  [8] 0.7540206 0.7238039 0.6950334 0.6636351 0.6243891 0.5993385 0.5769297
## [15] 0.5519124 0.5366042 0.5296241 0.5194643 0.4818245 0.4413453 0.3916907
## [22] 0.3539872 0.3181363 0.2818305 0.2584268 0.2245354
## 
## $freq
##  [1] 0.01010101 0.02020202 0.03030303 0.04040404 0.05050505 0.06060606
##  [7] 0.07070707 0.08080808 0.09090909 0.10101010 0.11111111 0.12121212
## [13] 0.13131313 0.14141414 0.15151515 0.16161616 0.17171717 0.18181818
## [19] 0.19191919 0.20202020 0.21212121 0.22222222 0.23232323 0.24242424
## [25] 0.25252525 0.26262626 0.27272727 0.28282828 0.29292929 0.30303030
## [31] 0.31313131 0.32323232 0.33333333 0.34343434 0.35353535 0.36363636
## [37] 0.37373737 0.38383838 0.39393939 0.40404040 0.41414141 0.42424242
## [43] 0.43434343 0.44444444 0.45454545 0.46464646 0.47474747 0.48484848
## [49] 0.49494949
## 
## $db
##  [1]  15.4662467   6.4612009   2.8156959  -6.5831271  -0.3491412   0.6467012
##  [7] -23.4462233 -11.9860652  -4.2541616  -2.3339334  -4.9973000  -0.5165929
## [13]  -8.4672734  -5.9260513 -12.5070338  -1.5329287  -9.4007029  -4.2193607
## [19] -12.8919484 -17.0150048  -6.9836378 -14.0339519  -5.8623391 -15.3945590
## [25] -11.2233206 -24.0796617 -10.8749229 -18.3577480 -11.7127985  -7.5363064
## [31] -14.6647063 -14.4603439 -11.9075717 -18.9642429 -19.7115192 -13.9330289
## [37] -14.7859997 -12.3169328 -15.2897638 -15.2611997  -7.5246856 -25.4508518
## [43] -13.1571835 -18.5429171 -14.5183414 -11.9244393 -15.4696702 -20.2139032
## [49] -16.6394603
## 
## $dbz
##  [1]  10.3894114   9.7014613   8.5461122   6.9166900   4.8252742   2.3519955
##  [7]  -0.2434927  -2.4200960  -3.6721480  -4.1238920  -4.2332485  -4.2725076
## [13]  -4.3507834  -4.5391300  -4.8913942  -5.4201345  -6.0871771  -6.8234956
## [19]  -7.5683596  -8.2978154  -9.0157797  -9.7227007 -10.4014424 -11.0315750
## [25] -11.6029838 -12.1026903 -12.4960138 -12.7466330 -12.8695657 -12.9469286
## [31] -13.0762896 -13.3067957 -13.6115217 -13.9016587 -14.0770531 -14.0895034
## [37] -13.9687564 -13.7936668 -13.6492815 -13.6034164 -13.6994736 -13.9536233
## [43] -14.3531696 -14.8590436 -15.4153465 -15.9631305 -16.4485233 -16.8195565
## [49] -17.0238291
plotts.sample.wge(df$low, arlimits=T) # estimated model
## $autplt
##  [1] 1.0000000 0.9542096 0.8987323 0.8524345 0.8100849 0.7813310 0.7600881
##  [8] 0.7371705 0.7123568 0.6828689 0.6476693 0.6153474 0.5864842 0.5600570
## [15] 0.5282542 0.4955941 0.4660246 0.4343222 0.4026243 0.3675647 0.3278754
## [22] 0.2900614 0.2568547 0.2354177 0.2239847 0.2178967
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  14.2537148   9.9541873   5.8173014  -0.5437278  -0.8235314  -1.3564258
##  [7]   1.4197169  -1.1176687  -7.3533813  -3.8782332  -0.5445071  -3.3199576
## [13]  -5.9250727  -5.4172047  -2.9311702 -11.1750831  -6.3519213 -18.5926329
## [19] -18.7079883  -8.4217804  -9.3402655  -9.6710352  -9.7031584  -8.2549030
## [25] -14.1830462 -11.3498316 -12.9743644 -12.1531178 -16.0829501  -9.7837888
## [31] -12.9010407  -9.5737077 -12.8850744 -15.0135304 -10.3417581 -16.6716427
## [37] -13.1487825 -11.0083985 -16.2745553 -18.6304870 -17.7423184 -13.7665082
## [43] -16.0810139 -16.9046379 -13.1224062 -20.5533807 -18.2518162 -14.7903000
## [49] -19.2395748 -19.0265205
## 
## $dbz
##  [1]  10.6247182   9.9050329   8.7000676   7.0111987   4.8698055   2.3951954
##  [7]  -0.1075597  -2.1509930  -3.4458827  -4.1937204  -4.7075065  -5.1324715
## [13]  -5.5423563  -6.0346253  -6.7027509  -7.5765092  -8.5979157  -9.6409675
## [19] -10.5694523 -11.2993670 -11.8169224 -12.1628463 -12.4196726 -12.6865263
## [25] -13.0293947 -13.4394270 -13.8345951 -14.1178754 -14.2557593 -14.2960188
## [31] -14.3127453 -14.3564021 -14.4497949 -14.6054534 -14.8345314 -15.1426627
## [37] -15.5243385 -15.9631650 -16.4335783 -16.8977220 -17.3041080 -17.6055003
## [43] -17.7946958 -17.9201902 -18.0549021 -18.2464892 -18.4904595 -18.7368379
## [49] -18.9191710 -18.9861139
#########################################################################################################
############### Confirmation for repeated model visualization of ACF and Spectral Density ###############
#########################################################################################################
for( i in 1:5)
{
   SpecDen2 = parzen.wge(gen.aruma.wge(99, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1, vara=estimated_arma$avar), plot = T)
   lines(SpecDen2$freq,SpecDen2$pzgram, lwd = 2, col = "red")
}
for( i in 1:5)
{
   ACF2 = acf(gen.aruma.wge(99, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1, vara=estimated_arma$avar), plot = T)
   lines(ACF2$lag ,ACF2$acf, lwd = 2, col = "red")
}
#########################################################################################################
#########################################################################################################
#########################################################################################################

# after comparing the sample spectral density and ACF plots, we compared the AIC model identifications for the differenced data and the simulated realization
aic5.wge(dftrans)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 17    5    1  -2.001944
## 13    4    0  -1.985397
## 14    4    1  -1.966194
## 16    5    0  -1.966140
## 10    3    0  -1.934893
aic5.wge(est_mod) # ARMA(5,1) and ARMA(5,0) rank at the top for the differenced data; the aic calculations for the simulated realization errored below, so that half of the comparison is inconclusive
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Error in aic calculation at 0 0 
## Error in aic calculation at 0 1 
## Error in aic calculation at 0 2 
## Error in aic calculation at 1 0 
## Error in aic calculation at 1 1 
## Error in aic calculation at 1 2 
## Error in aic calculation at 2 0 
## Error in aic calculation at 2 1 
## Error in aic calculation at 2 2 
## Error in aic calculation at 3 0 
## Error in aic calculation at 3 1 
## Error in aic calculation at 3 2 
## Error in aic calculation at 4 0 
## Error in aic calculation at 4 1 
## Error in aic calculation at 4 2 
## Error in aic calculation at 5 0 
## Error in aic calculation at 5 1 
## Error in aic calculation at 5 2 
## Five Smallest Values of  aic
##      p    q        aic
## 1    0    0     999999
## 2    0    1     999999
## 3    0    2     999999
## 4    1    0     999999
## 5    1    1     999999
######
######

######
###### Forecast and Measure Average Squared Errors (ASE)

# Model Forecast and ASE
# Forecast the original series using the ARMA(5,1) parameters estimated from the differenced data, applying d=1 within the forecast
nonseasonal.forecast <- fore.aruma.wge(df$low, phi=estimated_arma$phi, theta=estimated_arma$theta, d=1,n.ahead=5, lastn=T)
ARIMA_ASE = mean((df$low[(nrow(df)-5+1):nrow(df)] - nonseasonal.forecast$f)^2)
ARIMA_ASE # 0.2026529
## [1] 0.2026529
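The holdout ASE above is a pattern repeated throughout this analysis, so it can be captured in a small helper function. This is a hypothetical convenience, not part of the original script:

```r
# Hypothetical helper: mean squared error over the last length(forecasts)
# observations of a series (the holdout window).
holdout_ase <- function(actual, forecasts) {
  h <- length(forecasts)
  holdout <- actual[(length(actual) - h + 1):length(actual)]
  mean((holdout - forecasts)^2)
}

# Sanity check on toy data: the actual tail is c(4, 5) and the forecasts
# are c(4, 4), so the ASE is mean(c(0, 1)) = 0.5
holdout_ase(c(1, 2, 3, 4, 5), c(4, 4))
```

The computation above is then equivalent to `holdout_ase(df$low, nonseasonal.forecast$f)`.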
######
######
df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
# ccf(df$volume, df$low, type="correlation")
ksfit <- lm(low ~ volume + HiLo + OpClo, data=df)

aic.wge(ksfit$residuals, p=0:10, q=0:5)  # AIC picks p=2, q=1 for the regression residuals
## $type
## [1] "aic"
## 
## $value
## [1] -0.9747528
## 
## $p
## [1] 2
## 
## $q
## [1] 1
## 
## $phi
## [1]  1.4919089 -0.4948128
## 
## $theta
## [1] 0.7196951
## 
## $vara
## [1] 0.3482785
# Earlier exploratory fit (commented out): AR(5) on low with volume as the exogenous variable
# fit <- arima(df$low, order=c(5,0,0), xreg=df[,'volume'])

# Volume and the differences between high and low, open and closed seem the most reasonable
fit <- arima(df$low, order=c(2,0,1), xreg=cbind(df$volume, df$HiLo, df$OpClo))
fit
## 
## Call:
## arima(x = df$low, order = c(2, 0, 1), xreg = cbind(df$volume, df$HiLo, df$OpClo))
## 
## Coefficients:
##          ar1      ar2     ma1  intercept  cbind(df$volume, df$HiLo, df$OpClo)1
##       1.0088  -0.0211  0.3996    38.5574                                 5e-04
## s.e.  0.2685   0.2683  0.2533     2.9076                                 7e-04
##       cbind(df$volume, df$HiLo, df$OpClo)2
##                                    -0.5414
## s.e.                                0.0882
##       cbind(df$volume, df$HiLo, df$OpClo)3
##                                    -0.0253
## s.e.                                0.0531
## 
## sigma^2 estimated as 0.1068:  log likelihood = -32.36,  aic = 80.73
acf(fit$residuals)

ltest <- ljung.wge(fit$resid)
## Obs -0.02855528 -0.06968382 -0.153247 -0.2019444 -0.03428312 -0.05339646 0.115814 -0.05455948 0.001613383 0.02120805 -0.04531485 -0.002710999 -0.0490779 0.05246381 -0.04684995 0.06755038 0.1042888 -0.06139787 0.0001146488 0.001605157 0.06701037 -0.1378427 -0.1167967 -0.130531
ltest$pval # large p-value: we fail to reject the null hypothesis of white noise, so the residuals are consistent with white noise
## [1] 0.6792181
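As a cross-check on `ljung.wge`, base R's `Box.test` computes the same Ljung-Box statistic. A sketch assuming `fit` from the chunk above; the lag of 24 matches `ljung.wge`'s default K, and `fitdf` is an illustrative adjustment for the p + q = 3 estimated ARMA parameters:

```r
# Ljung-Box test via base R on the same residuals;
# fitdf subtracts degrees of freedom for the 2 AR and 1 MA coefficients.
Box.test(fit$residuals, lag = 24, type = "Ljung-Box", fitdf = 3)
```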
# ASE for the model without trend (holdout: last 5 points)
df2 <- df[1:(nrow(df)-5),] # make a training dataset that excludes the most recent 5 points that we want to predict (our test data, or "holdout")
ksfit <- lm(low ~ volume + HiLo + OpClo, data=df2)
aic.wge(ksfit$residuals,p=0:8, q=0:5)  # AIC picks p=5, q=1
## $type
## [1] "aic"
## 
## $value
## [1] -0.8857076
## 
## $p
## [1] 5
## 
## $q
## [1] 1
## 
## $phi
## [1] -0.02773821  0.73987954  0.03184284 -0.01327807  0.23973056
## 
## $theta
## [1] -0.8814089
## 
## $vara
## [1] 0.3559105
fit <- arima(df2$low, order=c(5,0,1), xreg=cbind(df2$volume, df2$HiLo, df2$OpClo))
fit
## 
## Call:
## arima(x = df2$low, order = c(5, 0, 1), xreg = cbind(df2$volume, df2$HiLo, df2$OpClo))
## 
## Coefficients:
##          ar1     ar2      ar3      ar4    ar5     ma1  intercept
##       0.7466  0.3085  -0.2500  -0.0094  0.194  0.7143    38.1898
## s.e.  0.2598  0.3794   0.2317   0.1649  0.107  0.2506     9.7318
##       cbind(df2$volume, df2$HiLo, df2$OpClo)1
##                                        0.0003
## s.e.                                   0.0017
##       cbind(df2$volume, df2$HiLo, df2$OpClo)2
##                                       -0.4603
## s.e.                                   0.0888
##       cbind(df2$volume, df2$HiLo, df2$OpClo)3
##                                       -0.0365
## s.e.                                   0.0513
## 
## sigma^2 estimated as 0.09723:  log likelihood = -26.6,  aic = 75.2
preds <- predict(fit, newxreg=cbind(df$volume[(nrow(df)-4):nrow(df)], df$HiLo[(nrow(df)-4):nrow(df)], df$OpClo[(nrow(df)-4):nrow(df)]))
ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$pred)^2)
ASE # 0.1251275
## [1] 0.1251275
plot(seq(1,nrow(df),1), df$low, type="b", xlab = "Trading Days", ylab = "Trade Prices", main = "Multiple Linear Regression without Trend", lwd=2)
points(seq((nrow(df) - 4), nrow(df), 1), preds$pred, type="b", pch=15, lwd=2)
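Because `predict` on an `arima` fit also returns forecast standard errors, approximate 95% prediction bands can be added to the plot above. A sketch assuming `fit` and `preds` from the preceding chunk:

```r
# Approximate 95% prediction bands from the forecast standard errors
upper <- preds$pred + 1.96 * preds$se
lower <- preds$pred - 1.96 * preds$se
lines(seq(nrow(df) - 4, nrow(df), 1), upper, lty = 2)
lines(seq(nrow(df) - 4, nrow(df), 1), lower, lty = 2)
```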

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
t=1:nrow(df) # df has 100 rows; create a time index over them for the trend term

# ccf(df$volume, df$low, type="correlation")
ksfit <- lm(low ~ t + volume + HiLo + OpClo, data=df)

aic.wge(ksfit$residuals, p=0:10, q=0:5)  # AIC picks p=2, q=0 for ACGL
## $type
## [1] "aic"
## 
## $value
## [1] -2.268002
## 
## $p
## [1] 2
## 
## $q
## [1] 0
## 
## $phi
## [1]  1.1075085 -0.3546197
## 
## $theta
## [1] 0
## 
## $vara
## [1] 0.09749033
# Volume and the differences between high and low, open and closed seem the most reasonable
fit <- arima(df$low, order=c(2,1,0), xreg=cbind(t, df$volume, df$HiLo, df$OpClo)) # earlier analysis determined a difference is needed
fit
## 
## Call:
## arima(x = df$low, order = c(2, 1, 0), xreg = cbind(t, df$volume, df$HiLo, df$OpClo))
## 
## Coefficients:
##          ar1      ar2       t                         
##       0.3868  -0.2427  0.0743  5e-04  -0.5443  -0.0308
## s.e.  0.0994   0.0998  0.0404  6e-04   0.0861   0.0517
## 
## sigma^2 estimated as 0.1016:  log likelihood = -27.4,  aic = 68.8
acf(fit$residuals)

ltest <- ljung.wge(fit$resid)
## Obs -0.04406815 0.013187 -0.09641126 -0.2057866 -0.05742416 -0.09614599 0.09199866 -0.07652935 -0.005201798 0.02075405 -0.043928 0.003946746 -0.04769934 0.07715348 -0.0402132 0.07246384 0.09541723 -0.06112577 -0.005449058 -0.01273569 0.07542778 -0.1375788 -0.09979972 -0.111335
ltest$pval # large p-value: we fail to reject the null hypothesis of white noise, so the residuals are consistent with white noise
## [1] 0.7775975
# ASE for the model with trend and a time-volume interaction (holdout: last 5 points)
df2 <- df[1:(nrow(df)-5),] # make a training dataset that excludes the most recent 5 points that we want to predict (our test data, or "holdout")
t2 <- 1:(nrow(df)-5)
ksfit <- lm(df2$low ~ t2 + df2$volume + df2$HiLo + df2$OpClo + t2*df2$volume)
aic.wge(ksfit$residuals,p=0:8, q=0:5)  # AIC picks p=2 q=0
## $type
## [1] "aic"
## 
## $value
## [1] -2.182751
## 
## $p
## [1] 2
## 
## $q
## [1] 0
## 
## $phi
## [1]  1.054223 -0.311019
## 
## $theta
## [1] 0
## 
## $vara
## [1] 0.1058313
fit <- arima(df2$low, order=c(2,1,1), xreg=cbind(t2, df2$volume, df2$HiLo, df2$OpClo, t2*df2$volume))
fit
## 
## Call:
## arima(x = df2$low, order = c(2, 1, 1), xreg = cbind(t2, df2$volume, df2$HiLo, 
##     df2$OpClo, t2 * df2$volume))
## 
## Coefficients:
##          ar1      ar2     ma1      t2                             
##       1.2322  -0.4859  -1.000  0.0719  -0.001  -0.4714  -0.0610  0
## s.e.  0.0897   0.0925   0.051  0.0007   0.001   0.0895   0.0532  0
## 
## sigma^2 estimated as 0.08505:  log likelihood = -19.31,  aic = 56.63
preds <- predict(fit, newxreg=cbind(t[(nrow(df)-4):nrow(df)], df$volume[(nrow(df)-4):nrow(df)], df$HiLo[(nrow(df)-4):nrow(df)], df$OpClo[(nrow(df)-4):nrow(df)], t[(nrow(df)-4):nrow(df)]*df$volume[(nrow(df)-4):nrow(df)]))
ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$pred)^2)
ASE # 0.03534549
## [1] 0.03534549
plot(seq(1,nrow(df),1), df$low, type="b", xlab = "Trading Days", ylab = "Trade Prices", main = "Multiple Linear Regression with Trend and Volume Interaction", lwd=2)
points(seq((nrow(df) - 4), nrow(df), 1), preds$pred, type="b", pch=15)

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
t=1:nrow(df) # df has 100 rows; create a time index over them for the trend term

# ccf(df$volume, df$low, type="correlation")
ksfit <- lm(low ~ t + volume + HiLo + OpClo, data=df)

aic.wge(ksfit$residuals, p=0:10, q=0:5)  # AIC picks p=2, q=0 for ACGL
## $type
## [1] "aic"
## 
## $value
## [1] -2.268002
## 
## $p
## [1] 2
## 
## $q
## [1] 0
## 
## $phi
## [1]  1.1075085 -0.3546197
## 
## $theta
## [1] 0
## 
## $vara
## [1] 0.09749033
# Volume and the differences between high and low, open and closed seem the most reasonable
fit <- arima(df$low, order=c(2,1,0), xreg=cbind(t, df$volume, df$HiLo, df$OpClo)) # earlier analysis determined a difference is needed
fit
## 
## Call:
## arima(x = df$low, order = c(2, 1, 0), xreg = cbind(t, df$volume, df$HiLo, df$OpClo))
## 
## Coefficients:
##          ar1      ar2       t                         
##       0.3868  -0.2427  0.0743  5e-04  -0.5443  -0.0308
## s.e.  0.0994   0.0998  0.0404  6e-04   0.0861   0.0517
## 
## sigma^2 estimated as 0.1016:  log likelihood = -27.4,  aic = 68.8
acf(fit$residuals)

ltest <- ljung.wge(fit$resid)
## Obs -0.04406815 0.013187 -0.09641126 -0.2057866 -0.05742416 -0.09614599 0.09199866 -0.07652935 -0.005201798 0.02075405 -0.043928 0.003946746 -0.04769934 0.07715348 -0.0402132 0.07246384 0.09541723 -0.06112577 -0.005449058 -0.01273569 0.07542778 -0.1375788 -0.09979972 -0.111335
ltest$pval # large p-value: we fail to reject the null hypothesis of white noise, so the residuals are consistent with white noise
## [1] 0.7775975
# ASE for the model with trend and both time interactions (holdout: last 5 points)
df2 <- df[1:(nrow(df)-5),] # make a training dataset that excludes the most recent 5 points that we want to predict (our test data, or "holdout")
t2 <- 1:(nrow(df)-5)
ksfit <- lm(df2$low ~ t2 + df2$volume + df2$HiLo + df2$OpClo + t2*df2$volume)
aic.wge(ksfit$residuals,p=0:8, q=0:5)  # AIC picks p=2 q=0
## $type
## [1] "aic"
## 
## $value
## [1] -2.182751
## 
## $p
## [1] 2
## 
## $q
## [1] 0
## 
## $phi
## [1]  1.054223 -0.311019
## 
## $theta
## [1] 0
## 
## $vara
## [1] 0.1058313
fit <- arima(df2$low, order=c(2,1,1), xreg=cbind(t2, df2$volume, df2$HiLo, df2$OpClo, t2*df2$volume, t2*df2$OpClo))
fit
## 
## Call:
## arima(x = df2$low, order = c(2, 1, 1), xreg = cbind(t2, df2$volume, df2$HiLo, 
##     df2$OpClo, t2 * df2$volume, t2 * df2$OpClo))
## 
## Coefficients:
##          ar1      ar2     ma1      t2                                      
##       1.2317  -0.4849  -1.000  0.0716  -0.0011  -0.4742  -0.1017  0  0.0008
## s.e.  0.0898   0.0926   0.051  0.0007   0.0010   0.0897   0.1261  0  0.0023
## 
## sigma^2 estimated as 0.08494:  log likelihood = -19.25,  aic = 58.51
preds <- predict(fit, newxreg=cbind(t[(nrow(df)-4):nrow(df)], df$volume[(nrow(df)-4):nrow(df)], df$HiLo[(nrow(df)-4):nrow(df)], df$OpClo[(nrow(df)-4):nrow(df)], t[(nrow(df)-4):nrow(df)]*df$volume[(nrow(df)-4):nrow(df)], t[(nrow(df)-4):nrow(df)]*df$OpClo[(nrow(df)-4):nrow(df)]))
ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$pred)^2)
ASE # 0.03452872
## [1] 0.03452872
plot(seq(1,nrow(df),1), df$low, type="b", xlab = "Trading Days", ylab = "Trade Prices", main = "Multiple Linear Regression with Trend, Volume and OpClo Interactions", lwd=2)
points(seq((nrow(df) - 4), nrow(df), 1), preds$pred, type="b", pch=15)

# Vector Autoregressive Modeling

## Vector Autoregressive Modeling without Trend

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
ccf(df$volume,df$low, type="correlation")
ccf(df$HiLo,df$low, type="correlation")
ccf(df$OpClo,df$low, type="correlation")

df2 <- df[1:(nrow(df)-5),]
VARselect(df2$low, lag.max=6, type="const", season=NULL, exogen=data.frame(df2$volume, df2$HiLo, df2$OpClo)) # AIC(n) selects p=5 (the AIC at p=1 is -2.3235); HQ and SC select p=2
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      2      2      5 
## 
## $criteria
##                  1           2           3           4          5           6
## AIC(n) -2.32345681 -2.42117979 -2.41376593 -2.40362716 -2.4365656 -2.42032626
## HQ(n)  -2.26710299 -2.35355521 -2.33487058 -2.31346105 -2.3351287 -2.30761863
## SC(n)  -2.18364578 -2.25340655 -2.21803048 -2.17992951 -2.1849057 -2.14070419
## FPE(n)  0.09794606  0.08883496  0.08950683  0.09043349  0.0875214  0.08897736
lsfit <- VAR(y=cbind(df2$low, df2$volume, df2$HiLo, df2$OpClo), p=1, type="const") # with p=1, this estimates the coefficients for lag 1
preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.2383062
## [1] 0.2383062
plot(seq(1,100,1), df$low[1:100], type = "l", xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling without Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$y1[,1], type = "l", col = "red", lwd=2)
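An alternative to folding the predictors into `y` as endogenous series is the `exogen` argument of `VAR`, with future exogenous values supplied to `predict` through `dumvar`. A sketch under the assumption that `df` and `df2` are defined as above (`VAR` requires at least two endogenous series, so volume is kept endogenous here):

```r
# Treat HiLo and OpClo as exogenous; keep low and volume endogenous
exog_train <- cbind(HiLo = df2$HiLo, OpClo = df2$OpClo)
var_fit <- VAR(y = cbind(low = df2$low, volume = df2$volume),
               p = 1, type = "const", exogen = exog_train)

# Future exogenous values for the 5 holdout days
exog_future <- cbind(HiLo = df$HiLo[(nrow(df)-4):nrow(df)],
                     OpClo = df$OpClo[(nrow(df)-4):nrow(df)])
preds_exog <- predict(var_fit, n.ahead = 5, dumvar = exog_future)
```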

## Vector Autoregressive Modeling with Trend

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
ccf(df$volume,df$low, type="correlation")
ccf(df$HiLo,df$low, type="correlation")
ccf(df$OpClo,df$low, type="correlation")

df2 <- df[1:(nrow(df)-5),]
t <- 1:100
# Because of the white noise cross-correlation between all exogenous variables and the response variable, it's reasonable to expect only an AR(1) from VARselect. The model will quickly approximate the signal's mean without much sensitivity to change.
VARselect(df2$low, lag.max=6, type="const", season=NULL, exogen=data.frame(t[1:95], df2$volume, df2$HiLo, df2$OpClo)) # t must align with the 95 training rows; the AIC at p=1 is -2.3158, though AIC(n) selects p=5
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      2      2      5 
## 
## $criteria
##                  1           2           3          4           5           6
## AIC(n) -2.31575921 -2.40748095 -2.39878469 -2.3871149 -2.42434136 -2.40794490
## HQ(n)  -2.24813464 -2.32858561 -2.30861859 -2.2856780 -2.31163373 -2.28396651
## SC(n)  -2.14798598 -2.21174551 -2.17508704 -2.1354550 -2.14471929 -2.10036063
## FPE(n)  0.09871144  0.09007115  0.09087247  0.0919582  0.08862082  0.09011447
lsfit <- VAR(y=cbind(df2$low, t[1:95], df2$volume, df2$HiLo, df2$OpClo), p=1, type="const") # with p=1, this will estimate the coefficients for lag 1
preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.1132188
## [1] 0.1132188
plot(seq(1,100,1), df$low[1:100], type = "l",xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling with Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$y1[,1], type = "l", col = "red", lwd=2)

## Vector Autoregressive Modeling with Trend, Time and Volume Interaction

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
ccf(df$volume,df$low, type="correlation")
ccf(df$HiLo,df$low, type="correlation")
ccf(df$OpClo,df$low, type="correlation")

df2 <- df[1:(nrow(df)-5),]
t <- 1:100
# Because of the white noise cross-correlation between all exogenous variables and the response variable, it's reasonable to expect only an AR(1) from VARselect. The model will quickly approximate the signal's mean without much sensitivity to change.
VARselect(df2$low, lag.max=6, type="const", season=NULL, exogen=data.frame(t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume)) # t must align with the 95 training rows; the AIC at p=1 is -2.3500, though AIC(n) selects p=5
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      2      2      5 
## 
## $criteria
##                  1           2           3          4          5           6
## AIC(n) -2.34997094 -2.42927913 -2.41971843 -2.4064194 -2.4400669 -2.42758273
## HQ(n)  -2.27107560 -2.33911303 -2.31828156 -2.2937117 -2.3160886 -2.29233357
## SC(n)  -2.15423549 -2.20558148 -2.16805857 -2.1267973 -2.1324827 -2.09203625
## FPE(n)  0.09540299  0.08814319  0.08900838  0.0902234  0.0872658  0.08839582
lsfit <- VAR(y=cbind(df2$low, t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume), p=1, type="const") # with p=1, this estimates the lag-1 coefficients
preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE #0.1087767 with time and volume interaction. 0.1132188 with only time and no interaction
## [1] 0.1087767
plot(seq(1,100,1), df$low[1:100], type = "l",xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling with Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$y1[,1], type = "l", col = "red", lwd=2)
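The expectation above, that an AR(1) quickly approximates the signal's mean, can be made concrete: the h-step forecast of a stationary AR(1) with mean m and coefficient phi is m + phi^h * (x_n - m), which decays geometrically toward m. A minimal base-R sketch; phi, m, and x_n below are illustrative values, not estimates from our data:

```r
# h-step-ahead AR(1) forecasts: xhat(h) = m + phi^h * (x_n - m).
# phi, m, and x_n are hypothetical values for illustration only.
phi <- 0.7          # AR(1) coefficient (|phi| < 1 for stationarity)
m   <- 20          # process mean
x_n <- 23          # last observed value
h   <- 1:5         # forecast horizons
fc  <- m + phi^h * (x_n - m)
round(fc, 4)       # forecasts shrink geometrically back toward the mean of 20
```

With |phi| well below 1, the forecast is within half a unit of the mean by step 5, which is why these fits show little sensitivity to recent change.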

Vector Autoregressive Modeling with Trend, Time and Volume Interaction, Time and OpClo Interaction

It is important to note that we did not identify an interaction between time and HiLo that benefited the model, but we did find the interaction between time and OpClo beneficial for reducing ASE.
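The interaction regressors used in these models are simply elementwise products of the time index with an exogenous series. A minimal base-R sketch with toy values (not our data):

```r
# Building an interaction column for an exogenous regressor frame:
# elementwise product of the time index with the series (toy values).
t_idx  <- 1:5
op_clo <- c(0.4, -0.2, 0.1, 0.3, -0.1)   # hypothetical OpClo values
X <- data.frame(t = t_idx, OpClo = op_clo, t_OpClo = t_idx * op_clo)
X$t_OpClo   # 0.4 -0.4  0.3  1.2 -0.5
```

Passing such columns through `exogen=` (or binding them into `y=`) lets the linear VAR capture a regressor whose effect grows or shrinks over time.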

df <- stock_data()
## Parsed with column specification:
## cols(
##   times = col_date(format = ""),
##   open = col_double(),
##   high = col_double(),
##   low = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   symbol = col_character(),
##   market = col_character()
## )
plotts.sample.wge(df$volume, arlimits=T)

## $autplt
##  [1]  1.000000000  0.332654213 -0.015323504 -0.185197647  0.005818653
##  [6]  0.124607489  0.059217024 -0.058842730 -0.135327419 -0.128903104
## [11] -0.111554245 -0.136230829 -0.177090893 -0.102374463 -0.010295688
## [16] -0.012154962 -0.091622722 -0.032089912 -0.074665693 -0.032921963
## [21] -0.004581175  0.213392652  0.211975490  0.061996282 -0.059048222
## [26] -0.052510741
## 
## $freq
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
## [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
## [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
## [46] 0.46 0.47 0.48 0.49 0.50
## 
## $db
##  [1]  -7.38499578  -0.95161030   3.39895369   3.13204413   7.25816141
##  [6]  -3.34022975   1.45696680  -3.16420812   0.98653170  -0.01510512
## [11]   0.74899492  -3.54602921  -7.04898470   6.69701287   0.21917900
## [16]   3.26040620  -2.30071218   5.83627330   0.12105751   4.40156709
## [21]   1.31507769   1.49404141   4.21104386  -7.07865344 -14.93552255
## [26]  -1.90529423  -9.05704419   1.17291691  -2.44739663 -15.18817408
## [31]  -3.82537165  -8.97778373  -1.31847009  -8.52554281  -9.96514339
## [36]  -0.84871304  -2.04248679  -9.19693386  -5.37219836  -6.41835850
## [41]  -0.84106671 -15.69187726   0.78348200  -7.78153910 -19.81844091
## [46]  -1.92362217  -0.03109830  -1.67305303  -6.34494742  -2.36851032
## 
## $dbz
##  [1]  0.7617948  1.3396040  1.9342166  2.3252670  2.4227279  2.2272880
##  [7]  1.8074853  1.2989121  0.8916181  0.7572102  0.9377718  1.3279105
## [13]  1.7766211  2.1767262  2.4818412  2.6857904  2.7968223  2.8178953
## [19]  2.7371907  2.5301127  2.1698520  1.6414415  0.9557012  0.1601924
## [25] -0.6606149 -1.4101839 -2.0301647 -2.5352613 -2.9884786 -3.4384161
## [31] -3.8748905 -4.2313776 -4.4336135 -4.4602264 -4.3559544 -4.1882068
## [37] -4.0026903 -3.8148228 -3.6258049 -3.4384625 -3.2600843 -3.0965714
## [43] -2.9492483 -2.8191207 -2.7129243 -2.6423490 -2.6148787 -2.6237376
## [49] -2.6467693 -2.6582701
ccf(df$volume,df$low, type="correlation")
ccf(df$HiLo,df$low, type="correlation")
ccf(df$OpClo,df$low, type="correlation")

df2 <- df[1:(nrow(df)-5),] # hold out the last 5 trading days for assessment
t <- 1:100 # time index over the full 100-day series
# The cross-correlations between each exogenous variable and the response resemble white noise, so it's reasonable to expect a low-order model such as an AR(1) from VARselect. Such a model will quickly approximate the signal's mean without much sensitivity to change.
VARselect(df2$low, lag.max=6, type="const", season=NULL, exogen=data.frame(t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume, t[1:95]*df2$OpClo)) # AIC favors p=5; we fit p=1 per the reasoning above
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      2      2      5 
## 
## $criteria
##                  1           2           3           4           5           6
## AIC(n) -2.32768937 -2.40708908 -2.39749839 -2.38397323 -2.41776285 -2.40540814
## HQ(n)  -2.23752327 -2.30565222 -2.28479076 -2.25999484 -2.28251369 -2.25888822
## SC(n)  -2.10399172 -2.15542922 -2.11787633 -2.07638896 -2.08221637 -2.04189945
## FPE(n)  0.09756828  0.09013962  0.09103188  0.09230076  0.08926814  0.09041876
lsfit <- VAR(y=cbind(df2$low, t[1:95], df2$volume, df2$HiLo, df2$OpClo, t[1:95]*df2$volume, t[1:95]*df2$OpClo), p=1, type="const") # with p=1, this estimates the lag-1 coefficients
preds <- predict(lsfit, n.ahead=5)

ASE <- mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)
ASE # 0.1010439
## [1] 0.1010439
plot(seq(1,100,1), df$low[1:100], type = "l",xlim = c(0,100), xlab = "Trading Days", ylab = "Trade Prices", main = "Vector Autoregressive Modeling with Trend", lwd=2)
lines(seq(96,100,1), preds$fcst$y1[,1], type = "l", col = "red", lwd=2)
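The ASE computed for each model is the mean squared error over the 5 held-out observations. A small base-R helper makes the arithmetic explicit (the function name `ase` is ours, not from any package; the numbers are made up):

```r
# Average squared error between held-out actuals and forecasts.
ase <- function(actual, forecast) mean((actual - forecast)^2)

# Toy check with made-up values:
actual   <- c(10.0, 10.2, 10.1, 10.4, 10.3)
forecast <- c(10.1, 10.1, 10.3, 10.2, 10.3)
ase(actual, forecast)  # mean of (0.01, 0.01, 0.04, 0.04, 0) = 0.02
```

This is exactly the `mean((df$low[(nrow(df)-4):nrow(df)] - preds$fcst$y1[,1])^2)` expression used above, factored out for readability.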

Neural Network Models

Neural Network without Trend

df <- stock_data()

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(volumeFull, HiLoFull, OpCloFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(volumeFull, HiLoFull, OpCloFull))
plot(fore.mlp)
ASE <- mean((lowTest - fore.mlp$mean)^2)
ASE # 0.1758194
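Every model in this section uses the same evaluation scheme: train on all but the last 5 trading days, then score the 5-step-ahead forecasts against those held-out days. A base-R sketch of the index arithmetic on a toy 100-row frame (simulated data, assumed series length of 100):

```r
# Last-5 holdout split, as used for every model in this section (toy data).
set.seed(1)
n <- 100
df_toy <- data.frame(low = 20 + cumsum(rnorm(n, sd = 0.1)))  # fake price path
train  <- df_toy[1:(n - 5), , drop = FALSE]  # rows 1..95, used for fitting
test   <- df_toy$low[(n - 4):n]              # rows 96..100, forecast actuals
c(nrow(train), length(test))                 # 95 and 5
```

Note the off-by-one pattern in the two index expressions: `n - 5` ends the training set at row 95, while `n - 4` starts the test set at row 96.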

Neural Network with Trend

df <- stock_data()
t <- 1:nrow(df)

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull))
plot(fore.mlp)
ASE <- mean((lowTest - fore.mlp$mean)^2)
ASE # 0.06421939

Neural Network with Trend, Time and Volume Interaction

df <- stock_data()
t <- 1:nrow(df)

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull, t*volumeFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull, t*volumeFull))
plot(fore.mlp)
ASE <- mean((lowTest - fore.mlp$mean)^2)
ASE # 0.05486796

Neural Network with Trend, Time and Volume Interaction, Time and OpClo Interaction

As with the VAR models, we did not identify an interaction between time and HiLo that benefited the model. While the time-OpClo interaction reduced ASE for the VAR model, here the resulting ASE (0.0568) is slightly higher than without it (0.0549); with 15 stochastic training replications, differences this small may simply reflect run-to-run variation.

df <- stock_data()
t <- 1:nrow(df)

dfTrain <- df[1:(nrow(df)-5),]
lowTest <- df$low[(nrow(df)-4):nrow(df)] # forecast actuals
lowTrain <- ts(dfTrain$low) # a training set of the target minus the test forecast
volumeFull <- ts(df$volume) # the full dataset (train + test) of xreg
HiLoFull <- ts(df$HiLo) # the full dataset (train + test) of xreg
OpCloFull <- ts(df$OpClo) # the full dataset (train + test) of xreg

fit.mlp <- mlp(lowTrain, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull, t*volumeFull, t*OpCloFull), reps = 15, comb = "mean", hd.auto.type="cv")
fit.mlp
plot(fit.mlp)

fore.mlp <- forecast(fit.mlp, h=5, xreg=data.frame(t, volumeFull, HiLoFull, OpCloFull, t*volumeFull, t*OpCloFull))
plot(fore.mlp)
ASE <- mean((lowTest - fore.mlp$mean)^2)
ASE # 0.05682376
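Collecting the holdout ASEs reported above into one table makes the comparison easier (values copied from the runs shown; the mlp results vary from run to run):

```r
# Holdout ASEs reported in this section, copied from the output above.
results <- data.frame(
  model = c("VAR: trend + volume interaction",
            "VAR: trend + volume + OpClo interactions",
            "MLP: no trend",
            "MLP: trend",
            "MLP: trend + volume interaction",
            "MLP: trend + volume + OpClo interactions"),
  ASE   = c(0.1087767, 0.1010439, 0.1758194,
            0.06421939, 0.05486796, 0.05682376)
)
results[order(results$ASE), ]  # MLP with trend + volume interaction is lowest
```

On these runs, the neural networks with a trend regressor outperform both VAR fits, with the time-volume interaction giving the lowest holdout ASE.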